library(Seurat)
library(dplyr)
library(patchwork)
library(sctransform)
library(ggplot2)
library(ggpubr)
library(tidyr)
library(ggrepel)
library(celldex) # Cell annotation.
library(SingleR) # Cell annotation.
library(parallel) # detectCores()
library(future) # Allows parallelization in Seurat.
library(readODS) # Allows ods file import to add sample info
# Set up Seurat pararell computing.
options(parallelly.fork.enable = TRUE)
plan("multicore", workers = detectCores())
We recently found out that some clusters are exclusively formed by cells of a single run, indicanting the presence of batch effect.
p1 <- DimPlot(SCP_data, reduction = "umap", label = TRUE, repel = TRUE, label.size = 4) + theme(axis.title = element_text(size = 15), legend.text = element_text(size = 10), axis.text = element_text(size = 10)) + guides(colour = guide_legend(override.aes = list(size = 3)))
p2 <- DimPlot(SCP_data, reduction = "umap", label = TRUE, repel = TRUE, label.size = 4, group.by = "Sample_Name") + theme(axis.title = element_text(size = 15), legend.text = element_text(size = 10), axis.text = element_text(size = 10)) + guides(colour = guide_legend(override.aes = list(size = 3)))
p3 <- DimPlot(SCP_data, reduction = "umap", label = FALSE, repel = TRUE, label.size = 4, group.by = "orig.ident") + theme(axis.title = element_text(size = 15), legend.text = element_text(size = 10), axis.text = element_text(size = 10)) + guides(colour = guide_legend(override.aes = list(size = 3)))
p1 + p2 + p3
Note: Cluster 4 is 100% run4 and cluster 15 is 100% run 3. They output an error and we don’t want to expend much time on this.
As we can see in the image, cluster 4 and cluster 15 is composed exclusively of cells from run4 and run3, respectively. Also, analyzing the proportions of each run throws an unequal distribution of cells from each batch.
clusters <- levels(SCP_data$seurat_clusters)
df <- data.frame()
for(i in 1:length(clusters)){
if (clusters[i] != 4 & clusters[i] != 14) {
cur_df <- as.data.frame(SCP_data@meta.data %>% subset(seurat_clusters == clusters[i]) %>% .$orig.ident %>% table() /
table(SCP_data$orig.ident))
cur_df$Freq <- cur_df$Freq * 1/(sum(cur_df$Freq))
cur_df$cluster <- clusters[i]
df <- rbind(df, cur_df)
} else {
cur_df <- as.data.frame(SCP_data@meta.data %>% subset(seurat_clusters == clusters[i]) %>% .$orig.ident %>% table())
cur_df$Freq <- cur_df$Freq * 1/(sum(cur_df$Freq))
cur_df$cluster <- clusters[i]
df <- rbind(df, cur_df)
}
}
RunFreq.noInt <- ggplot(df, aes(y=Freq, x=cluster, fill=.)) +
geom_bar(stat='identity') +
scale_y_continuous(expand = c(0,0)) +
ylab('normalized proportion') +
theme(
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
axis.text.x = element_text(angle=45, hjust=1),
axis.title.x = element_blank(),
legend.title = element_blank(),
axis.line.y = element_blank(),
axis.line.x = element_blank()
)
RunFreq.noInt
We also re-visualize the PCA plot:
Idents(SCP_data) <- "orig.ident"
DimPlot(SCP_data, reduction = "pca")
Seurat offers methods to integrate data from different batches (different sequencing runs, different samples, etc.). In v5, a new method called IntegrateLayers they easily perform the whole pipeline in a few steps. The first of them is to split the data according to the source of batch effect (in our case by run).
# The split method splits the given slot according to the indicated information.
SCP_data[["RNA"]] <- split(SCP_data[["RNA"]], f = SCP_data$orig.ident)
It is necessary to normalize the data before integration. According to maintainers (https://github.com/satijalab/seurat/issues/4811) it doesn’t mind to normalize before or after splitting.
Since recent discussion indicates that according to the use case sometimes is better to work on raw, non-integrated values (https://github.com/satijalab/seurat/discussions/5452?utm_source=pocket_saves), we also perform the standard normalization workflow.
# Standard normalization.
SCP_data <- NormalizeData(SCP_data)
SCP_data <- FindVariableFeatures(SCP_data, nfeatures = 3000)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
SCP_data <- ScaleData(SCP_data)
We now run SCTransform, PCA and the integration.
# Launch sct transform and run PCA.
SCP_data <- SCTransform(SCP_data, vst.flavor = "v2", vars.to.regress = "percent.mt")
SCP_data <- RunPCA(SCP_data, assay = "SCT")
# Integration.
SCP_data <- IntegrateLayers(
object = SCP_data, method = HarmonyIntegration,
normalization.method = "SCT",
verbose = FALSE
)
There are several methods for integration, but according to some sources (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), the best ones are Harmony, Seurat’s CCA and scVII (requires setting up a conda environment and installing reticulate, see documentation). We choose Harmony for being well-integrated with Seruat, and performing pretty well with low computational effort.
If we re-visualize the PCA plot, we appreciate a bigger overlap between runs.
Idents(SCP_data) <- "orig.ident"
DimPlot(SCP_data, reduction = "pca")
While the elbow appears around the 15th PC, we extend the included PCs up to the 30th following the new guidelines by the Seurat developers after SCTransform normalization implementation.
ElbowPlot(SCP_data, ndims = 50, reduction = "pca") # choose 30
The heatmap and the feature plot give us information about the significance and composition of each PC.
DimHeatmap(SCP_data, dims = 1:30, balanced = TRUE, reduction = "pca")
VizDimLoadings(SCP_data, dims = 1:30, reduction = "pca")
WARING: Since the data is splitted into 4 layers, is not possible to plot the top variable genes, but we still can run FindVariableFeatures and access the list using VariableFeatrues.
# SCP_data <- FindVariableFeatures(SCP_data, selection.method = "vst", nfeatures = 3000, assay = "SCT")
# Genes with differential expression in different cells are good candidates to be biomarkers.
# Identify the 20 most highly variable genes
top <- head(VariableFeatures(SCP_data, assay = "SCT",layer = run), 20)
Clustering resolution fine-tuning is performed by a separated .Rmd file (Resolution fine-tuning.Rmd), giving 0.8 as a appropriate value for clustering.
It is important to notice that a new reduction is created called “harmony”. We need to indicate this as the reduction to use in the clustering.
# Resolution fine-tuned in a separate Rmd.
SCP_data <- FindNeighbors(SCP_data, dims = 1:30, reduction = "harmony")
SCP_data <- FindClusters(SCP_data, resolution = 0.8, ) # Default resolution = 0.8
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 13365
Number of edges: 492091
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9139
Number of communities: 24
Elapsed time: 1 seconds
SCP_data <- RunUMAP(SCP_data, dims = 1:30, reduction = "harmony")
Using method 'umap'
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
We now appreciate a more homogeneous distribution of the cells from each run among the clusters:
p1 <- DimPlot(SCP_data, reduction = "umap", label = TRUE, repel = TRUE, label.size = 4) + theme(axis.title = element_text(size = 15), legend.text = element_text(size = 10), axis.text = element_text(size = 10)) + guides(colour = guide_legend(override.aes = list(size = 3)))
p2 <- DimPlot(SCP_data, reduction = "umap", label = TRUE, repel = TRUE, label.size = 4, group.by = "Sample_Name") + theme(axis.title = element_text(size = 15), legend.text = element_text(size = 10), axis.text = element_text(size = 10)) + guides(colour = guide_legend(override.aes = list(size = 3)))
p3 <- DimPlot(SCP_data, reduction = "umap", label = FALSE, repel = TRUE, label.size = 4, group.by = "orig.ident") + theme(axis.title = element_text(size = 15), legend.text = element_text(size = 10), axis.text = element_text(size = 10)) + guides(colour = guide_legend(override.aes = list(size = 3)))
p1 + p2 + p3
clusters <- levels(SCP_data$seurat_clusters)
df <- data.frame()
for(i in 1:length(clusters)){
cur_df <- as.data.frame(SCP_data@meta.data %>% subset(seurat_clusters == clusters[i]) %>% .$orig.ident %>% table() /
table(SCP_data$orig.ident))
cur_df$Freq <- cur_df$Freq * 1/(sum(cur_df$Freq))
cur_df$cluster <- clusters[i]
df <- rbind(df, cur_df)
}
RunFreq.Int <- ggplot(df, aes(y=Freq, x=cluster, fill=.)) +
geom_bar(stat='identity') +
scale_y_continuous(expand = c(0,0)) +
ylab('normalized proportion') +
theme(
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
axis.text.x = element_text(angle=45, hjust=1),
axis.title.x = element_blank(),
legend.title = element_blank(),
axis.line.y = element_blank(),
axis.line.x = element_blank()
)
RunFreq.Int
Proportions are a bit more equilibrated now.
RunFreq.noInt + RunFreq.Int
Cluster QC shows homogeneous characteristics across the clusters with the exception of cluster 22 (abnormal low number of features). Despite what it might look like, this actually conforms and advantage, as if that specific group of cells give inconsistent results, because they have been isolated in their own cluster we can move forward just ignoring or removing it.
# Visualize QC metrics as a violin plot
VlnPlot(SCP_data, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) + NoLegend()
# Zoom in on nFeature_RNA violin plot.
VlnPlot(SCP_data, features = "nFeature_RNA", ncol = 1) + ylim(0, 2500) + NoLegend()
# Visualize relationships in metadata to detect outliers with FeatureScatter function
plot1 <- FeatureScatter(SCP_data, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(SCP_data, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + NoLegend() + plot2 + NoLegend()
Cell-by-cell annotation identifies 30 different cell types based on the HumanPrimaryCellAtlas, which distribution pretty much coincides with the cluster layout.
For this step, we now need to join the layers before exporting to a SCExperiment object.
# Using celldex and SingleR packages.
# Download reference data from celldex.
reference <- HumanPrimaryCellAtlasData()
# Join layers before conversion. SingleR uses the RNA assay.
merged.SCexp <- JoinLayers(SCP_data, assay = "RNA")
# Convert Seurat object into a SingleCellExperiment Object for SingleR input.
merged.SCexp <- as.SingleCellExperiment(merged.SCexp)
SingleR.annotation <- SingleR(test = merged.SCexp, ref = reference, assay.type.test = "logcounts", labels = reference$label.main, num.threads = detectCores())
SCP_data[["cell.labels"]] <- SingleR.annotation$labels
DimPlot(SCP_data, reduction = "umap", group.by = "cell.labels", label = TRUE, repel = TRUE, label.size = 4) + theme(axis.title = element_text(size = 15), legend.text = element_text(size = 10), axis.text = element_text(size = 10)) + guides(colour = guide_legend(override.aes = list(size = 3)))
We create our own composite cluster names by adding the cluster annotation to the cluster number provided by Suerat. Thus, what we see is an estimation of the cell type that better fits the genetic landscape of each cluster.
# Obtaining a vector containing the cluster of each cell in order.
# Get the factor contained in the SeuratObject with all this information.
clusters <- SCP_data@meta.data[["seurat_clusters"]]
# The cluster information for each cell is contain as a factor which levels coincide with the total number of clusters found by FindClusters(). An approach to transform this factor into a character vector is the following:
# Obtain the list of clusters with levels(clusters). This outputs a character vector containing the levles of the factor. After that, we use the factor itself as an index to access the levels vector. When using a factor as an index, R does not use the labels itself (which in this case are string, so if used as indexes would cause an error), but the internal numeric index the factor contains. That way, for each cluster label associated with a cell in the factor, we access its numeric index and map it to the levels vectors (which coincides), thus obtaining each cell label as an unique character value. Each cell label is then storage as a character (the as.character is added as a control method since SingleR only admits strings as labels) in a vector. The vector contains the cluster label for each cell as a character value in the same order as each cell appears in the dataset, so the by-cluster annotation doesn't assign the cells to an incorrect cluster.
clusters <- as.character(levels(clusters)[clusters])
# reference <- HumanPrimaryCellAtlasData()
# merged.SCexp <- as.SingleCellExperiment(SCP_data)
# We input the cluster vector using the clusters parameter.
SingleR.annotation <- SingleR(test = merged.SCexp, ref = reference, assay.type.test = "logcounts", labels = reference$label.main, clusters = clusters, num.threads = detectCores())
SCP_data[["cluster.labels"]] <- SingleR.annotation$labels
# We composite the cluster name. That way when 2 clusters' names are the same Seurat doesn't merge the labels.
# Get clusters levels accessing the SeuratObject variable as a df and then accessing the df as a column.
cluster_number <- levels(SCP_data[["seurat_clusters"]][1, ])
# Get annotation labels.
cluster_annotation <- SingleR.annotation$labels
# Since cluster levels and labels are in the same order, we composite the new names using paste0 (sort of equivalent to fstrings in python).
new.clusters.ids <- paste0(cluster_number, "-", cluster_annotation)
# Add names to each value of the clusters id vector so Seurat can take it as a valid input for RenameIdents.
names(new.clusters.ids) <- levels(SCP_data)
SCP_data <- RenameIdents(SCP_data, new.clusters.ids)
SCP_data[["cell.cluster.labels"]] <- Idents(SCP_data)
DimPlot(SCP_data, reduction = "umap", label = TRUE, repel = TRUE, label.size = 4) + theme(axis.title = element_text(size = 15), legend.text = element_text(size = 10), axis.text = element_text(size = 10)) + guides(colour = guide_legend(override.aes = list(size = 3)))
Here, we start the more deep and accurate analysis of the dataset.
We repeat this analysis to follow the new guidelines provided here: https://satijalab.org/seurat/articles/parsebio_sketch_integration
As recommended by Soneson et all. and Crowell et al., we use an aggregation-based (pseudobulk) workflow using the raw counts. We aggregate all cells within the same cell type and sample using the AggregateExpression function. This returns a Seurat object where each ‘cell’ represents the pseudobulk profile of one cell type in one individual.
First, we would take a look at the proportion of cells from the different groups (T, N, NAT and AT) on each cluster.
For that we will plot the frequency of each sample oAs recommended by Soneson et all. and Crowell et al., we use an aggregation-based (pseudobulk) workfln each cluster (that is, the fraction of the total cells of that given type present on each cluster).
# Sumarize() performs an operation over a row of a dataframe, for example mean() or count n().
# Slots are dataframes, so it can be used by dplyr.
prop_cell_by_sample <- SCP_data@meta.data %>% group_by(Sample_Name, cell.cluster.labels) %>%
summarise(n = n()) %>% # For each cluster, group the cells from the same sample together and count them.
ungroup() %>% group_by(Sample_Name) %>%
mutate(freq = n / sum(n)) %>% # Group them now by sample, add up the total number of cells from that sample (regardless of the cluster they belong to). Then, divide each n value (number of cells of a sample in a certain cluster), obtaining which fraction of the total cells of that given type is present on each cluster.
left_join(SCP_data@meta.data %>% select(Sample_Name, Sample_Group) %>% unique()) # Add metadata info available for the data, select only the Sample_Name and Sample_Group fields and delete duplicates.
For now, will be only focus on comparing the confirmed tumoral (T) and healthy (N) samples.
# Filter only the desired groups "N" and "T"
prop_cell_by_sample_filtered <- prop_cell_by_sample %>%
filter(Sample_Group %in% c("N", "T"))
We apply the Wilcoxon test to find out significant differences between sample groups. We found out that the cluster 18 (annotated as endothelial cells) is the only group with a significant enrichment in tumoral cells.
# Extract data for Sample_Group "T" and other groups
group_T <- prop_cell_by_sample[prop_cell_by_sample$Sample_Group == "T", ]
group_N <- prop_cell_by_sample[prop_cell_by_sample$Sample_Group == "N", ]
# Perform t-test for each cluster
clusters <- levels(prop_cell_by_sample$cell.cluster.labels)
p_values <- numeric(length(clusters))
for (i in seq_along(clusters)) {
cluster_data_T <- group_T[group_T$cell.cluster.labels == clusters[i], "freq"]
cluster_data_N <- group_N[group_N$cell.cluster.labels == clusters[i], "freq"]
# Perform test
test_result <- wilcox.test(cluster_data_T$freq, cluster_data_N$freq)
# Store p-value
p_values[i] <- test_result$p.value
}
# Identify clusters with significant enrichment (e.g., p-value < 0.05)
enriched_clusters <- clusters[p_values < 0.05]
# Print or visualize the enriched clusters
print(enriched_clusters)
character(0)
print(p_values)
[1] 1.00000000 0.49225774 0.95779221 0.36763237 0.79245754 0.09340659 0.41358641 0.79245754 0.23976024 0.63536464 0.71278721
[12] 0.95779221 0.18056943 0.32767233 0.27212787 0.95779221 0.94971695 0.11348651 0.21978022 0.60639361 0.81818182 0.55555556
[23] 1.00000000 0.16666667
Unfortunaletly, after applying integration, no cluster is significant. We plot the results along the statistics in a paper-ready figure.
# Plot only the desired groups
enriched_clusters <- ggboxplot(prop_cell_by_sample_filtered, x = "Sample_Group", y = "freq",
color = "Sample_Group", palette = "jco",
add = "jitter") +
facet_wrap(~cell.cluster.labels, scales = "free", nrow = 6) +
theme(legend.position = "none") +
xlab("Sample Group") + ylab("Frequency") +
stat_compare_means(aes(label = ..p.signif..), label.x = 1.5)
enriched_clusters
Anyway, in order to test this new method, we will now perform a DE analysis on cluster 18 by comparing cells coming from T samples (tumoral) vs healthy samples (N).
# By-cell method (uses integrated and SCTransformed data)
SCP_data <- PrepSCTFindMarkers(SCP_data)
cluster18.markers.cell <- FindMarkers(SCP_data, ident.1 = "T", ident.2 = "N", group.by = "Sample_Group", subset.ident = "18-T_cells", assay = "SCT", recorrect_umi = FALSE)
# Using pseudobulk.
bulk <- AggregateExpression(SCP_data, return.seurat = T, slot = "counts", assays = "RNA", group.by = c("seurat_clusters",
"Sample_Name", "Sample_Group"))
|
| | 0%
|
|= | 1%
|
|== | 1%
|
|== | 2%
|
|=== | 2%
|
|=== | 3%
|
|==== | 3%
|
|==== | 4%
|
|===== | 4%
|
|===== | 5%
|
|====== | 5%
|
|====== | 6%
|
|======= | 6%
|
|======== | 7%
|
|========= | 7%
|
|========= | 8%
|
|========== | 8%
|
|========== | 9%
|
|=========== | 9%
|
|=========== | 10%
|
|============ | 10%
|
|============ | 11%
|
|============= | 11%
|
|============= | 12%
|
|============== | 12%
|
|============== | 13%
|
|=============== | 13%
|
|================ | 14%
|
|================= | 14%
|
|================= | 15%
|
|================== | 15%
|
|================== | 16%
|
|=================== | 16%
|
|=================== | 17%
|
|==================== | 17%
|
|==================== | 18%
|
|===================== | 18%
|
|===================== | 19%
|
|====================== | 19%
|
|======================= | 20%
|
|======================== | 21%
|
|========================= | 21%
|
|========================= | 22%
|
|========================== | 22%
|
|========================== | 23%
|
|=========================== | 23%
|
|=========================== | 24%
|
|============================ | 24%
|
|============================ | 25%
|
|============================= | 25%
|
|============================= | 26%
|
|============================== | 26%
|
|=============================== | 27%
|
|================================ | 27%
|
|================================ | 28%
|
|================================= | 28%
|
|================================= | 29%
|
|================================== | 29%
|
|================================== | 30%
|
|=================================== | 30%
|
|=================================== | 31%
|
|==================================== | 31%
|
|==================================== | 32%
|
|===================================== | 32%
|
|===================================== | 33%
|
|====================================== | 33%
|
|======================================= | 34%
|
|======================================== | 34%
|
|======================================== | 35%
|
|========================================= | 35%
|
|========================================= | 36%
|
|========================================== | 36%
|
|========================================== | 37%
|
|=========================================== | 37%
|
|=========================================== | 38%
|
|============================================ | 38%
|
|============================================ | 39%
|
|============================================= | 39%
|
|============================================== | 40%
|
|=============================================== | 41%
|
|================================================ | 41%
|
|================================================ | 42%
|
|================================================= | 42%
|
|================================================= | 43%
|
|================================================== | 43%
|
|================================================== | 44%
|
|=================================================== | 44%
|
|=================================================== | 45%
|
|==================================================== | 45%
|
|==================================================== | 46%
|
|===================================================== | 46%
|
|====================================================== | 47%
|
|======================================================= | 47%
|
|======================================================= | 48%
|
|======================================================== | 48%
|
|======================================================== | 49%
|
|========================================================= | 49%
|
|========================================================= | 50%
|
|========================================================== | 50%
|
|========================================================== | 51%
|
|=========================================================== | 51%
|
|=========================================================== | 52%
|
|============================================================ | 52%
|
|============================================================ | 53%
|
|============================================================= | 53%
|
|============================================================== | 54%
|
|=============================================================== | 54%
|
|=============================================================== | 55%
|
|================================================================ | 55%
|
|================================================================ | 56%
|
|================================================================= | 56%
|
|================================================================= | 57%
|
|================================================================== | 57%
|
|================================================================== | 58%
|
|=================================================================== | 58%
|
|=================================================================== | 59%
|
|==================================================================== | 59%
|
|===================================================================== | 60%
|
|====================================================================== | 61%
|
|======================================================================= | 61%
|
|======================================================================= | 62%
|
|======================================================================== | 62%
|
|======================================================================== | 63%
|
|========================================================================= | 63%
|
|========================================================================= | 64%
|
|========================================================================== | 64%
|
|========================================================================== | 65%
|
|=========================================================================== | 65%
|
|=========================================================================== | 66%
|
|============================================================================ | 66%
|
|============================================================================= | 67%
|
|============================================================================== | 67%
|
|============================================================================== | 68%
|
|=============================================================================== | 68%
|
|=============================================================================== | 69%
|
|================================================================================ | 69%
|
|================================================================================ | 70%
|
|================================================================================= | 70%
|
|================================================================================= | 71%
|
|================================================================================== | 71%
|
|================================================================================== | 72%
|
|=================================================================================== | 72%
|
|=================================================================================== | 73%
|
|==================================================================================== | 73%
|
|===================================================================================== | 74%
|
|====================================================================================== | 74%
|
|====================================================================================== | 75%
|
|======================================================================================= | 75%
|
|======================================================================================= | 76%
|
|======================================================================================== | 76%
|
|======================================================================================== | 77%
|
|========================================================================================= | 77%
|
|========================================================================================= | 78%
|
|========================================================================================== | 78%
|
|========================================================================================== | 79%
|
|=========================================================================================== | 79%
|
|============================================================================================ | 80%
|
|============================================================================================= | 81%
|
|============================================================================================== | 81%
|
|============================================================================================== | 82%
|
|=============================================================================================== | 82%
|
|=============================================================================================== | 83%
|
|================================================================================================ | 83%
|
|================================================================================================ | 84%
|
|================================================================================================= | 84%
|
|================================================================================================= | 85%
|
|================================================================================================== | 85%
|
|================================================================================================== | 86%
|
|=================================================================================================== | 86%
|
|==================================================================================================== | 87%
|
|===================================================================================================== | 87%
|
|===================================================================================================== | 88%
|
|====================================================================================================== | 88%
|
|====================================================================================================== | 89%
|
|======================================================================================================= | 89%
|
|======================================================================================================= | 90%
|
|======================================================================================================== | 90%
|
|======================================================================================================== | 91%
|
|========================================================================================================= | 91%
|
|========================================================================================================= | 92%
|
|========================================================================================================== | 92%
|
|========================================================================================================== | 93%
|
|=========================================================================================================== | 93%
|
|============================================================================================================ | 94%
|
|============================================================================================================= | 94%
|
|============================================================================================================= | 95%
|
|============================================================================================================== | 95%
|
|============================================================================================================== | 96%
|
|=============================================================================================================== | 96%
|
|=============================================================================================================== | 97%
|
|================================================================================================================ | 97%
|
|================================================================================================================ | 98%
|
|================================================================================================================= | 98%
|
|================================================================================================================= | 99%
|
|================================================================================================================== | 99%
|
|===================================================================================================================| 100%
# We don't know why it adds a "g" after pseudobulking.
cluster18 <- subset(bulk, seurat_clusters == "g18")
# As we are using bulks instead of cells, we need to change the test to a more appropiate one.
cluster18.markers.bulk <- FindMarkers(cluster18, ident.1 = "T", ident.2 = "N", group.by = "Sample_Group", test.use = "DESeq2")
Plot cell-based DE:
FC = log2(1.5) # Set FC threshold.
# Add a column to the data frame to specify if they are UP- or DOWN- regulated (log2fc respectively positive or negative).
cluster18.markers.cell$diffexpressed <- "NO"
# if FC > 1.5 (log2FC > ~0.6) and pvalue < 0.05, set as "UP".
cluster18.markers.cell$diffexpressed[cluster18.markers.cell$avg_log2FC > FC & cluster18.markers.cell$p_val < 0.05] <- "UP"
# if FC < 0.666 (invers 1/1.5 calculated as log2FC < ~-0.6) and pvalue < 0.05, set as "DOWN".
cluster18.markers.cell$diffexpressed[cluster18.markers.cell$avg_log2FC < -FC & cluster18.markers.cell$p_val < 0.05] <- "DOWN"
cluster18.markers.cell <- cluster18.markers.cell %>%
arrange(p_val) %>%
mutate(gene_symbol = rownames(.)) %>%
group_by(diffexpressed) %>%
mutate(delabel = if_else(diffexpressed %in% c("UP", "DOWN") & row_number() <= 10, gene_symbol, NA)) %>%
ungroup()
cell.DE <- ggplot(cluster18.markers.cell, aes(x = avg_log2FC, y = -log10(p_val), col = diffexpressed, label = delabel)) +
geom_vline(xintercept = c(-FC, FC), col = "grey", linetype = 'dashed') +
geom_hline(yintercept = -log10(0.05), col = "grey", linetype = 'dashed') +
geom_point(size = 2) +
scale_color_manual(values = c("#00AFBB", "grey", "#bb0c00"), # to set the colours of our variable
labels = c("Downregulated", "Not significant", "Upregulated")) + # to set the labels in case we want to overwrite the categories from the dataframe (UP, DOWN, NO)
labs(color = NULL, #legend_title,
x = expression("log"[2]*"FC"), y = expression("-log"[10]*"p-value")) +
theme_classic() +
scale_x_continuous(breaks = seq(-10, 10, 2)) + # to customise the breaks in the x axis
geom_text_repel(box.padding = 0.6, max.overlaps = Inf) # To show all labels
Plot bulk-based DE:
# Add a column to the data frame to specify if they are UP- or DOWN- regulated (log2fc respectively positive or negative).
cluster18.markers.bulk$diffexpressed <- "NO"
# if FC > 1.5 (log2FC > ~0.6) and pvalue < 0.05, set as "UP".
cluster18.markers.bulk$diffexpressed[cluster18.markers.bulk$avg_log2FC > FC & cluster18.markers.bulk$p_val < 0.05] <- "UP"
# if FC < 0.666 (invers 1/1.5 calculated as log2FC < ~-0.6) and pvalue < 0.05, set as "DOWN".
cluster18.markers.bulk$diffexpressed[cluster18.markers.bulk$avg_log2FC < -FC & cluster18.markers.bulk$p_val < 0.05] <- "DOWN"
cluster18.markers.bulk <- cluster18.markers.bulk %>%
arrange(p_val_adj) %>%
mutate(gene_symbol = rownames(.)) %>%
group_by(diffexpressed) %>%
mutate(delabel = if_else(diffexpressed %in% c("UP", "DOWN") & row_number() <= 10, gene_symbol, NA)) %>%
ungroup()
bulk.DE <- ggplot(cluster18.markers.bulk, aes(x = avg_log2FC, y = -log10(p_val), col = diffexpressed, label = delabel)) +
geom_vline(xintercept = c(-FC, FC), col = "grey", linetype = 'dashed') +
geom_hline(yintercept = -log10(0.05), col = "grey", linetype = 'dashed') +
geom_point(size = 2) +
scale_color_manual(values = c("#00AFBB", "grey", "#bb0c00"), # to set the colours of our variable
labels = c("Downregulated", "Not significant", "Upregulated")) + # to set the labels in case we want to overwrite the categories from the dataframe (UP, DOWN, NO)
labs(color = NULL, #legend_title,
x = expression("log"[2]*"FC"), y = expression("-log"[10]*"p-value")) +
theme_classic() +
scale_x_continuous(breaks = seq(-10, 10, 2)) + # to customise the breaks in the x axis
geom_text_repel(box.padding = 0.6, max.overlaps = Inf) # To show all labels
Compare:
cell.DE + bulk.DE
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Madrid
tzcode source: system (glibc)
attached base packages:
[1] parallel stats4 stats graphics grDevices datasets utils methods base
other attached packages:
[1] readODS_2.2.0 future_1.33.2 SingleR_2.4.1 celldex_1.12.0
[5] SummarizedExperiment_1.32.0 Biobase_2.62.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
[9] IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1 MatrixGenerics_1.14.0
[13] matrixStats_1.3.0 ggrepel_0.9.5 tidyr_1.3.1 ggpubr_0.6.0
[17] ggplot2_3.5.1 sctransform_0.4.1 patchwork_1.2.0 dplyr_1.1.4
[21] Seurat_5.0.3 SeuratObject_5.0.1 sp_2.1-3
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.3 later_1.3.2 filelock_1.0.3
[5] bitops_1.0-7 tibble_3.2.1 polyclip_1.10-6 fastDummies_1.7.3
[9] lifecycle_1.0.4 rstatix_0.7.2 globals_0.16.3 lattice_0.22-6
[13] MASS_7.3-60.0.1 backports_1.4.1 magrittr_2.0.3 limma_3.58.1
[17] plotly_4.10.4 yaml_2.3.8 httpuv_1.6.15 glmGamPoi_1.14.3
[21] spam_2.10-0 spatstat.sparse_3.0-3 reticulate_1.36.1 cowplot_1.1.3
[25] pbapply_1.7-2 DBI_1.2.2 RColorBrewer_1.1-3 abind_1.4-5
[29] zlibbioc_1.48.2 Rtsne_0.17 presto_1.0.0 purrr_1.0.2
[33] RCurl_1.98-1.14 rappdirs_0.3.3 GenomeInfoDbData_1.2.11 irlba_2.3.5.1
[37] listenv_0.9.1 spatstat.utils_3.0-4 goftest_1.2-3 RSpectra_0.16-1
[41] spatstat.random_3.2-3 fitdistrplus_1.1-11 parallelly_1.37.1 DelayedMatrixStats_1.24.0
[45] leiden_0.4.3.1 codetools_0.2-20 DelayedArray_0.28.0 tidyselect_1.2.1
[49] farver_2.1.1 ScaledMatrix_1.10.0 BiocFileCache_2.10.1 spatstat.explore_3.2-7
[53] jsonlite_1.8.8 progressr_0.14.0 ggridges_0.5.6 survival_3.5-8
[57] tools_4.3.3 ica_1.0-3 Rcpp_1.0.12 glue_1.7.0
[61] gridExtra_2.3 SparseArray_1.2.4 DESeq2_1.42.1 xfun_0.43
[65] withr_3.0.0 BiocManager_1.30.22 fastmap_1.1.1 fansi_1.0.6
[69] rsvd_1.0.5 digest_0.6.35 R6_2.5.1 mime_0.12
[73] colorspace_2.1-0 scattermore_1.2 tensor_1.5 spatstat.data_3.0-4
[77] RSQLite_2.3.6 RhpcBLASctl_0.23-42 ggsci_3.0.3 utf8_1.2.4
[81] generics_0.1.3 data.table_1.15.4 httr_1.4.7 htmlwidgets_1.6.4
[85] S4Arrays_1.2.1 uwot_0.2.2 pkgconfig_2.0.3 gtable_0.3.5
[89] blob_1.2.4 lmtest_0.9-40 SingleCellExperiment_1.24.0 XVector_0.42.0
[93] htmltools_0.5.8.1 carData_3.0-5 dotCall64_1.1-1 scales_1.3.0
[97] png_0.1-8 harmony_1.2.0 knitr_1.46 rstudioapi_0.16.0
[101] reshape2_1.4.4 curl_5.2.1 nlme_3.1-164 zoo_1.8-12
[105] cachem_1.0.8 stringr_1.5.1 BiocVersion_3.18.1 KernSmooth_2.23-22
[109] vipor_0.4.7 miniUI_0.1.1.1 AnnotationDbi_1.64.1 ggrastr_1.0.2
[113] pillar_1.9.0 grid_4.3.3 vctrs_0.6.5 RANN_2.6.1
[117] promises_1.3.0 BiocSingular_1.18.0 car_3.1-2 beachmat_2.18.1
[121] dbplyr_2.5.0 xtable_1.8-4 cluster_2.1.6 beeswarm_0.4.0
[125] locfit_1.5-9.9 cli_3.6.2 compiler_4.3.3 rlang_1.1.3
[129] crayon_1.5.2 future.apply_1.11.2 ggsignif_0.6.4 labeling_0.4.3
[133] ggbeeswarm_0.7.2 plyr_1.8.9 stringi_1.8.3 BiocParallel_1.36.0
[137] viridisLite_0.4.2 deldir_2.0-4 munsell_0.5.1 Biostrings_2.70.3
[141] lazyeval_0.2.2 bspm_0.5.5 spatstat.geom_3.2-9 Matrix_1.6-5
[145] ExperimentHub_2.10.0 RcppHNSW_0.6.0 sparseMatrixStats_1.14.0 bit64_4.0.5
[149] statmod_1.5.0 KEGGREST_1.42.0 shiny_1.8.1.1 interactiveDisplayBase_1.40.0
[153] AnnotationHub_3.10.0 ROCR_1.0-11 igraph_2.0.3 broom_1.0.5
[157] memoise_2.0.1 bit_4.0.5
RNGkind()
[1] "Mersenne-Twister" "Inversion" "Rejection"